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Abstract 

In  this  paper  we  propose  a  method  for  blind  signal  decomposition  that  does 
not  require  the  independence  or  stationarity  of  the  sources.  This  method,  that 
we  consider  a  simple  instance  of  non-linear  projection  pursuit,  is  based  on  the 
possibility  of  recovering  the  areas  in  the  time-frequency  where  the  original  sig¬ 
nals  are  isolated  or  almost  isolated  with  the  use  of  suitable  quotients  of  linear 
combinations  of  the  spectrograms  of  the  mixtures. 

We  then  threshold  such  quotients  according  to  the  value  of  their  imaginary 
part  to  prove  that  the  method  is  theoretically  sound  under  mild  assumptions 
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on  the  mixing  matrix  and  the  sources.  We  study  one  basic  algorithm  based  on 
this  method. 

The  algorithm  has  the  important  feature  of  estimating  the  number  of  sources 
with  two  measurements,  it  then  requires  n  —  2  additional  measurements  to  pro¬ 
vide  a  reconstruction  of  n  sources.  Experimental  results  show  that  the  method 
works  even  when  several  shifted  version  of  the  same  source  are  mixed. 
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1  Introduction 


Independent  Component  Analysis  can  recover  signals  that  are  linearly  mixed 
with  an  unknown  mixing  matrix.  All  algorithms  are  essentially  based  on  some 
local  learning  rule  (see  [L]  and  references  therein,  but  also  [QKS]).  This  proce¬ 
dure  is  effective,  but  it  suffers  from  the  need  to  assume  that  sources  are  inde¬ 
pendent  and  stationary.  A  different  approach  is  taken  in  [CC],  where  sources 
are  assumed  to  be  independent  and  non-stationary  and  only  time-delayed  cor¬ 
relations  of  the  observations  are  used  to  recover  the  mixing  matrix.  None  of  the 
previous  methods  can  handle  the  case  of  mixtures  of  sources  and  their  echoes, 
since  clearly  a  source  and  its  shifted  versions  are  not  independent. 

In  this  paper  we  suggest  an  algorithm  that  requires  a  different  set  of  assump¬ 
tions  on  the  sources.  This  algorithm  allows  to  estimate  the  number  of  sources 
given  at  least  two  mixtures  and  we  show  that,  if  there  are  additional  observa¬ 
tions  so  that  the  total  number  of  mixtures  is  equal  to  the  number  of  sources,  a 
full  reconstruction  algorithm  is  possible. 

More  specifically  let  xi  =  a\S\  +  ...  +  ansn,  x-2  =  biSi  +  ...  +  bnsn  be  the 
two  mixtures  of  n  real- valued  discrete  sources  sj,  i  =  1,  ...,  n  with  o<,6*  G  1R. 

Compute  the  spectrograms  of  x\  and  xn,  say  Ah  and  Ah,  where  by  spec¬ 
trograms  we  mean  the  complex-valued  matrices  of  windowed  discrete  Fourier 
transforms. 

Clearly,  if  we  denote  the  spectrograms  of  s  -,  by  5,;,  we  have: 

Ah  =  aiSi  +  ...  +  anSn,  AT  =  b\S\  +  ...  +  bnSn. 
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Let  R  be  a  non-singular  2x2  real- valued  matrix  that  we  call  exploratory  matrix, 
and  consider  the  quotient 

Rjh^X^tw)  +  R(l,2)X2(t,io) 

,W)  R(2,l)X1(t,w)  +  R(2,2)X2(t,w) 

where  t  is  the  time  coordinate  and  w  the  frequency  one. 

We  use  Qr (t,  w)  to  find  regions  in  the  time-frequency  plane  where  sources  are 
isolated  or  almost  isolated.  This  in  turn  reduces  the  search  for  the  unmixing 
matrix  (up  to  left  multiplication  by  a  diagonal  matrix)  to  the  solution  of  an 
underdetermined  system  of  linear  equations. 

The  algorithm  does  not  require  the  sources  to  be  independent  or  stationary, 
but  rather  it  relies  on  geometrical  separation  conditions  on  the  spectrograms 
of  the  sources.  In  essence,  two  related  data  sets  of  dimension  two  (Ah  and 
AY)  are  projected  onto  a  one  dimensional  space  through  the  non-linear  function 
Z  =  y1  +  r(2’^)  xl  ’  therefore  we  can  view  the  underlining  method  as  a  type 
of  non-linear  projection  pursuit  in  which  the  choice  of  the  exploratory  matrix 
determines  the  specific  non-linear  projection  of  interest  (  see  [H]  for  an  extensive 
treatment  of  projection  pursuit). 

The  second  section  of  this  paper  introduces  the  basic  idea  and  we  introduce 
the  definitions  and  tools  needed  for  our  ’’quotient  projection”  algorithm  .  An 
important  point  of  this  section  is  the  understanding  that  the  imaginary  part 
of  the  quotient  Qn(t,w),  a  simple  measure  of  ’’phase  locking”  between  the  two 
measurements,  can  be  used  to  increase  the  probability  of  finding  areas  where 
signals  are  isolated. 
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Section  3  presents  the  main  steps  of  the  algorithm  and  in  it  we  discuss  the 
limits  of  the  method. 

The  fundamental  role  of  separation  of  sources  in  time  frequency  domain  to 
achieve  reconstruction  was  already  underlined  by  Rickard  and  collaborators  in 
[RD],  [RBR],  [RY],  as  we  became  aware  of  in  the  final  stages  of  our  work.  In 
this  paper  we  stress  the  use  of  suitable  thresholding  of  the  imaginary  part  of 
Qn{t,w)  as  important  in  proving  the  theoretical  soundness  of  the  method.  The 
possibility  of  choosing  the  most  efficient  exploratory  matrix  is  also  emphasised 
here  in  line  with  the  idea  of  choosing  the  best  non-linear  projection. 

2  Quotients  Projections 

We  need  to  state  several  conditions  to  assure  that  Qn(t,w)  is  an  effective  tool  in 
detecting  sources.  We  start  with  a  condition  on  the  coefficients  in  the  mixtures 
x\  and  x-2- 

Condition  (1):  Assume  that  ^  ^  when  i  ^  j. We  call  the  slope  of 

the  source  s*. 

Denote  by  3'(/)  and  3 ?(/)  the  imaginary  and  real  parts  of  a  complex  function 
/.  Note  that  if  Sj(to,wo)  ^  0  for  a  single  i  =  i\  at  a  given  (to,wo)  then 

R(l,l)ai1Si1{to,w0)  +  R(l,  ‘2)bi1Si1  (tp,  wp)  _  IH\.  1  in  +  R{l,2)bh 
o,ioo)  -  R{2^)aiiSll(to,io0)  +  R(2,2)bilSil(to,wo)  ~  i?(2,  +  R(2,  2)bh 

and  therefore  '3(Q R(to,wo))  =0. 

Thus,  we  can  approximately  identify  the  regions  of  the  time-frequency  plane 
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where  the  different  sources  are  isolated,  by  retaining  only  the  elements  of  the 
matrix  Qii(t,w)  that  have  imaginary  part  very  near  to  zero.  Clearly,  this  is  a 
necessary  condition  to  be  verified  for  points  (t,  w)  where  the  sources  are  isolated, 
but  it  is  not  sufficient. 

Let  T(t,  w)  =  i  we  enforce  $t(QR(t0,w0))  «  0  asking  that  \T(t0,w0)\  < 

e.  This  is  a  computationally  simple  way  to  make  sure  that  the  relative  mag¬ 
nitude  of  the  imaginary  part  is  taken  in  account  rather  than  the  absolute  one, 
note  that  if  3i(Qi?(to, »o))  =  0  then  T(ta,w o)  is  not  defined,  on  the  other  hand 
we  will  see  in  the  following  discussion  that  a  point  for  which  5R(Q^(t0, wo))  = 

0  and  ^s{QR,{to,Wo))  ~  0  can  be  ignored.  The  case  »o))  >>  0, 

^(Q R(to -Wo))  0  may  lead  to  \T\  <  e  too,  but  this  case  is  unlikely,  if  at 

(to,  wo)  there  are  several  non-zero  sources  This  claim  will  be  made  precise  in 
lemma  2.1.  Our  choice  of  T(t,w)  is  clearly  not  the  only  possibility,  any  other 
continuous  function  T(t,w)  such  that  \T(t,w)\  =  0  implies  |3'(Qfl(t,  w))|  =  0 
would  be  suitable. 

Let  us  call  Q(R,e)(t,w)  the  function  obtained  by  thresholding  Qn(t,w)  in 
the  following  way: 

Q{fl,£)(tw)  =  QR(t,w )  if  I^TTT^TT— rrl  <  6  ,Q(R,t)(tiw)  =  0  Otherwise. 

Ideally  we  claim  that  the  distribution  of  the  values  of  3?(<3(i?,e)),  is,  as  e  goes 
to  zero,  made  of  several  delta  functions  of  different  weigth  centered  at  0  (due  to 
the  regions  of  the  time  frequency  domain  where  there  is  no  contribution  from 
any  signal)  and  at  the  value  of  the  quotients  qqj(i)  =  ;  i  =  l,,--,  n 
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that  we  can  assume  finite  for  a  generic  choice  of  R. 


Therefore  the  number  of  ’’dominant”  peaks  (see  remark  3.3)  of  the  value 
distribution  of  the  non  zero  values  of  3? (Q(R,e))  gives  us  generically  an  estimate 
of  the  number  of  sources  and  their  positions  will  give  the  values  of  the  qR(i)’s 
and  therefore  also  the  values  of  the  slopes  ^f-’s. 

Definition  2.1  We  call  silence  a  positive  area  region  in  time  frequency  domain 
where  both  Ah  and  AT  are  identically  zero. 

The  restriction  to  non  zero  values  of  3?(Q(r,£))  makes  the  case  in  which 
qnfi)  =  0  for  some  i  degenerate,  since  the  removal  of  the  zero  values  would 
also  remove  the  contribution  of  signal  s,  to  the  value  distribution.  A  generic 
perturbation  of  R  avoids  this  problem,  in  section  3  we  indicate  how  to  reduce 
the  chance  of  choosing  such  degenerate  exploratory  matrix. 

We  assume  throughout  this  section  that  R  is  choosen  so  that  qn{i)  is  finite 
and  non-zero  for  all  i  =  1, ...,  n. 

The  possibility  to  separate  sources  using  our  idea  rests  on  the  following 
assumption: 

Condition  (2):  the  sources  s,,  i  =  1,  ...,  n  are  separated  in  some  regions  of 
our  chosen  time-frequency  representation. 

Remark  2.1:  Any  complex- valued  frame  that  achieves  this  objective  for 
the  class  of  sources  that  we  are  interested  in  would  be  suitable  for  the  quotient 
signal  decomposition. 
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Figure  1:  Dominant  regions  as  defined  in  the  text  for  e  =  10  3  (top)  and  for 
e  =  10-2  (bottom) 

In  this  paper  we  perform  our  experimental  work  on  linear  mixtures  of  speech 
signals,  using  spectrograms  as  complex  linear  transformation,  therefore  it  is 
interesting  to  verify  to  which  extent  is  condition  (2)  true  for  this  class  of  signals. 
As  figure  1  shows,  given  two  speech  signals  si  and  s o  on  a  time  interval  of  0.61 
seconds,  the  region  in  time-frequency  domain  where  we  have  <  e  i  ^  j, 
i,  j  =  1,2  is  marginal  for  small  e  (e  =  10-3),  but  it  is  sizable  when  we  consider 
higher  values  (e  =  10-2). 

This  suggests  that  we  must  develop  the  theory  in  the  context  of  small,  but 
not  insignificant,  perturbations  that  can  arise  where  each  source  is  dominant. 
Such  perturbations  can  be  caused  by  noise  or  by  low  energy  contribution  by 


the  other  sources,  as  in  this  case,  therefore  the  analysis  of  the  problem  should 


not  require  a  specific  knowledge  of  the  probability  distribution  of  the  pertur¬ 
bations  therefore  a  non-parametric  approach  to  the  theoretical  stability  of  the 
algorithm  introduced  in  this  paper  is  needed.  For  the  time  being  we  consider 
the  ideal  case  in  which  signals  are  truly  separated  to  build  a  simple  version  of 
our  algorithm.  To  prove  that  this  basic  algorithm  is  theoretically  sound  we  need 
to  give  some  mild  conditions  on  the  probability  distribution  of  the  sources  in 
the  transformed  domain  to  make  sure  that  the  contribution  to  the  value  distri¬ 
bution  of  3 i(Q( /?,<=))  due  to  values  of  (t,  w )  where  we  do  have  mixtures  of  sources 
is  minimal.  Certainly  the  following  condition  has  to  be  satisfied: 

Condition  (3):  Sources  must  be  linearly  independent  on  positive  measure 
regions  of  the  spectrogram,  i.e.  given  any  positive  area  region  B,  we  cannot 
find  real  numbers  (p i,  ...,pn)  such  that  p\S\(t,w)  +  ...  +  pnSn(t,w)  =  0  for  all 
(t,  w)  €  B. 

If  condition  (3)  is  not  verified,  we  can  have  degenerate  situations  in  which 
ghost  sources  are  detected.  Assume  for  example  that  Sj  =  pSj,  p  constant  for 

some  i  and  j  in  a  region  M ,  and  that  this  dependence  happens  in  some  region 
where  the  contribution  of  other  signals  is  marginal.  Then  we  have  that,  on  M: 

_  R(  1,  1  )<<i ;Sj  +  ajSj)  +  R(l,2)(bjSi  4-  bjSj)  _  R(  1, 1  )(a,;  +  paj)  +  R(  1, 2 )(bj  +  pbj) 
jR  ~  R( 2,  l)(a,;5,:  +  ajSj)  +  i?(2,2)(6iSi  +  bjSj)  ~  R( 2,  1 1 (>/,  +  /»/,  I  +  R(2, 2)(6<  +  pbj) 

The  slope  of  a  ’’source”  that  does  not  exist  as  physical  entity  would  be 

detected  and,  the  quotient  projection  algorithm  would  need  n  + 1  measurements 
to  give  complete  reconstruction  of  the  n  physical  sources  and  the  virtual  one. 

On  the  other  hand  it  is  reasonable  that  sources  can  be  very  similar  in  some 
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cases  (think  of  Gregorian  chant). 

Thus  it  is  expected  that  there  will  always  be  limit  cases  that  lead  the  al¬ 
gorithm  into  detecting  ghost  sources.  Our  own  auditory  system  is  not  immune 
from  illusions. 

We  can  enforce  (3)  assuming  that: 

Condition  (3’)  S,;,  i  =  1, ....  n  are  spatially  distributed  realizations  of  ran¬ 
dom  complex  variables  S,  with  supports  Supp(Sj)  not  totally  overlapping,  i.e. 
there  exist  positive  area  regions  Bt  C  Supp(Sj)  such  that  Bt  does  not  belong  to 
any  Supp(Sj)  for  j  i. 

In  other  words  we  fix  the  geometrical  supports  and  we  assume  that  the  values 
of  Sj  on  each  value  of  ( t,w )  G  Supp(Sj)  is  a  realization  of  the  corresponding 
random  variable  Sj. 

Let  now  Z  =  f)Supp{Sj ),  V  =  (J  Supp(Si),  XSi  =  Supp(Sj)\[U(Supp(Sj)  n 
Supp(Sj))],  j  i  and  X  =  (J XSi- 

Remark  2.2  For  simplicity  let  Z\ JX  =  V ,  the  following  discussion  would 
work  even  if  there  are  regions  where  only  the  support  of  some  sources  overlap, 
at  least  when  the  number  of  sources  is  finite.  Moreover  note  that  silence  in  the 
time  frequency  domain  is  not  included  in  V. 

From  condition  (2)  we  know  that  X  is  of  positive  measure. 

Denote  S',;  =  3i(S,;)  +  iQ(S,;)  and  let  A  =  a.i5R(Si)  4-  ...a„3i(S,i),  A  = 

ai3;(Si)  +  ■■■0'n'Z(Sn),  Mi  =  6ilR(Si)  +  •••  +  6?l3i(S,i),  dX>  =  biQ(Si)  +  ...  + 

bnZ(Sn). 
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As  random  variables,  let  Fsr  =  SR(Qh)  and  Fq  =  ,  where  we  drop  the 

explicit  dependence  from  (t,iu)  while  assuming  that  (t,w)  is  a  random  point 
uniformly  distributed  in  V. 

A  simple  computation  shows  that, 

^  =  S(Qb)  =  '/M.l  /b2.2  //M.2:/A2.  I  •  /’..!/,  /A A/, 

3  T(P1,M1)B(P2,M2)  +  T(P2,M2)B(P1,M1) 

where  .1/,)  =  F(1,1)P,:  +  i?(l,2)M,:  and  B(PuMi)  =  R{2, 1)P,:  + 

i?(2,2)M,, 

To  prove  the  following  lemma  and  theorem  we  need  one  more  technical 
condition: 

Condition  (4):  the  joint  probability  density  function  of  P,  and  M,:,  i  =  1.2 
is  a  continuous  function. 

Let  fx  be  the  underlining  probability  density  function  of  the  values  of  Fj  \x 
on  a  region  X .  Then: 

Lemma  2.1  If  conditions  (1),  (2),  (S’)  and  (4)  are  satisfied,  then  the  proba¬ 
bility  density  function  /z>j£  of  Fg  knowing  that  |Fq|  <  e  converges  to  a  delta 
function  centered  at  the  origin  as  e  goes  to  zero.  More  specifically  fx>, o  =  fx- 

Proof:  Condition  (4),  and  the  fact  that 

Sq  =  {(M1,M2,P1,P2)  |  4 (Mi,  Mo, Pi,  F2)  =  q} 

is  a  set  of  measure  zero  in  1R 4  for  every  q  G  1R,  tell  us  that  fz  is  a  non-atomic 
probability  density  function.  As  regards  fx,  we  know  it  is  centered  at  the  origin, 
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since  signals  are  isolated  on  X  and  we  expect  F%  =  0  for  all  values  of  (t,w)  G  1, 
therefore  fx  is  a  delta  function  centered  at  the  origin. 

The  probability  density  function  that  we  observe,  before  imposing  the  thresh¬ 
olding,  is  /x>  =  A\(x))  fz  +  \(v)  fa  where  A(*)  is  the  area  function.  After  imposing 
that  \F%\  <  e  we  observe  the  new  probability  density  function: 

aA(Z)  A(l) 

/X>’£  oA{Z)  +  A(X yZ’"  <jA(Z)  +  A(I)JZ 

where  a  =  fz  and  fz,e  is  the  restriction  of  fz  to  the  interval  [ — e,  e] . 
Since  a  converges  to  zero  as  e  goes  to  zero,  we  have  that  fv,o  =  fx- 

The  previous  lemma  shows  that  most  non  zero  values  of  Q(R,e)  will  likely  be 
in  the  regions  Is,  for  e  small  enough.  Therefore  we  have  the  following  theorem. 
Denote  by  SqR(i)  the  delta  function  centered  at  qnii),  then: 

Theorem  2.1  Under  the  same  conditions  as  lemma  2.1 ,  the  probability  density 
function  gv,o  of  F^  knowing  that  F%  =  0  is  E*  ypry-'W*)' 

Proof:  As  a  consequence  of  lemma  2.1,  F 3  (to,wo)  =  0  implies  with  probabil¬ 
ity  1  that  (t-o,  wo)  G  X,  which  in  turn  implies  that  FsR(to,wo)  G  {</#(  1),  ...,qn(n)}. 
The  probability  that  F^(to,wo)  corresponds  to  any  specific  one  of  the  (/«(*) ’s  de¬ 
pends  from  the  area  of  each  region  X,; .  More  specifically  the  probability  density- 
function  of  Fsn  \x  is  gx  =  E*  therefore  gv, 0  =  gx  =  E*  ^ySgR(i)- 

Note  that  the  interpretation  of  the  sources  as  spatially  distributed  random 
variables  (that  is  conditions  (3’)  and  (4)),  is  not  essential  to  our  method,  but  it 
gives  a  possible  theoretical  basis  to  show  why  the  independence  of  the  sources 
is  not  needed. 
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Example  1:  To  see  in  practice  the  basic  idea  in  the  ideal  setting  we  applied 
the  algorithm  to  the  mixture  of  five  speech  signals  that  were  each  set  to  zero  on 
some  small  not  overlapping  time  intervals  of  length  0.0061  seconds  to  make  sure 
that  conditions  (2)  and  (3)  were  satisfied.  This  ideal  setting  is  not  unlikely  in 
practice,  as  it  will  happen  if  only  one  of  the  speech  sources  is  active  at  some  given 
time  interval.  Let  si, S5  be  the  sources  and  set  xi  =  si  +  10so  +  1.4^3  +  s 4  + 
0.3s5,  x\  =  si+3s2  +  l-4s3+s4  +  0.3s5,  x-2  =  si  +  3.03so  +  1-03s3— 4.99s4  +  0.5ss, 

the  resulting  mixtures  are  observed  on  a  time  interval  of  1.22  seconds.  Con- 
T  0.5  1  1 

sider  the  choice  of  R  =  g  g  ^  (for  a  preliminary  analysis  on  how  to  choose 
R  see  next  section). 

The  true  values  of  the  qn(i)’s  are:  0.8333,  0.83)2,  0.80)6,  1.0715,  0.8783. 
Note  that  qr(  1)  and  qn(‘2 )  are  very  close  to  each  other. 

An  histogram  of  the  value  distribution  of  ift(Q(R,e))  selecting  e  =  10-2  is 
shown  in  figure  2  top  left,  a  detail  is  shown  in  figure  2  bottom  left. 

Similarly  figure  2  top  right  shows  the  value  distribution  with  a  choice  of 
e  =  10-3  and  we  see  a  detail  of  the  distribution  at  the  light  bottom. 

We  can  see  that  as  e  becomes  smaller ,  the  value  distribution  approximate  the 
sum  of  the  delta  functions  centered  at  the  qR(i)  ’s. 

Note  that  even  very  minor  separations  in  the  slopes  of  si  and  so  are  resolved 
in  this  ideal  situations  as  the  details  at  the  bottom  of  figure  2  show. 
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Figure  2:  Value  distribution  of  3 ?(Q/?,e)  and  detail  of  the  distribution  for  e  = 
10-2  (left)  and  for  e  =  10-3  (right) 

3  Cluster  Detection  Algorithm 


In  this  section  we  estimate  the  number  of  sources  given  two  observations  and 
we  lay  down  the  basis  of  the  reconstruction  algorithm.  As  we  justify  the  as¬ 
sumptions  and  heuristic  behind  our  method,  we  will  deduce  several  steps  of  the 
algorithm  that  will  be  labeled  as  (Al),  (A2)  and  so  on. 

First  of  all,  we  have  to  choose  the  matrix  R  in  such  a  way  that  the  peaks  of 
the  value  distribution  of  non  zero  values  of  3 ?(Q(R,f))  are  enhanced  when  they 
correspond  to  the  values  of  the  qii(i)’ s. 

If  conditions  (1),  (2),  (3)  are  fully  satisfied  and  there  is  no  noise,  then  any 
choice  of  a  non  singular  R  such  that  its  rows  are  not  orthogonal  to  any  of  the 
(a,:,6j)  is  suitable,  as  this  is  sufficient  to  assure  that  all  qn(i):s  are  finite,  such 
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choice  is  the  generic  case,  therefore  we  can  expect  that  almost  any  R  will  allow 
the  detection  of  the  sources. 

In  practice,  when  signals  are  not  fully  isolated  anywhere,  we  need  to  select 
R  so  that,  for  each  the  relative  contribution  in  the  numerator  and  de¬ 

nominator  of  Qii{t,w )  due  to  each  signal  is  as  big  as  possible,  since  we  do  not 
want  to  reduce,  with  our  choice  of  R,  the  area  of  the  regions  where  each  signal 
is  ’’dominant”. 

This  already  implies  that  the  direction  of  both  the  row  vectors  of  R  must 
be  ’’far”  from  the  direction  of  the  vectors  — a*),  i  =  1,  the  orthogonal 
vectors  of  the  (»/,-./>,•  ). 

The  specific  value  of  each  5,  will  change  from  point  to  point,  therefore  we  can 
only  try  to  optimize  the  contribution  of  each  direction  determined  by  (6,;,  a,:), 
i  =  1,  To  make  such  statement  rigorous,  let  <:,• .  i  =  1,  ...,n  be  unit  vectors 
parallel  to  (6*,  a*),  and  let  r\  =  (i?(l,  1),  i?(l,  2))  r2  =  (R(2, 1),  i?(2,  2)).  Con¬ 
sider  the  function  F(r)  =  V  .  .  C < , '  > I  —  I < , '  >  I )  wjlere  <  a  b  >  denotes  the 
inner  product  of  a  and  b. 

Then  the  choice  of  the  second  row  of  the  exploratory  matrix  can  be  reduced 
to  the  solution  of  the  following  minimization  problem: 

(a)  minr2F(r2),  \r2\  =  1. 

We  can  then  choose  n  to  be  any  slight  perturbation  of  r2,  such  that  \r\  —  r2  \  < 
c\vi  —  Vj\,  for  all  choices  of  i  ^  j  with  c  <<  1. 

Clearly  the  exact  directions  associated  to  the  (/;,,</,  I ‘s  are  not  available  as 
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they  are  what  we  are  looking  for,  so  in  general  the  choice  of  such  ’’optimal” 
matrix  cannot  be  determined  and  the  use  of  several  exploratory  matrices,  each 
enhancing  a  different  source,  is  needed.  This  procedure  can  be  done,  but  it 
would  complicate  considerably  the  algorithm. 

As  this  paper  is  meant  to  be  an  introduction  to  the  basic  ideas  behind  such 
techniques,  we  restrict  our  attention  to  the  case  in  which  all  (a*,  6,;)  are  in  the 
positive  quadrant,  as  this  case  correspond  to  the  most  relevant  applications  in 
speech  processing  in  which  the  coefficients  of  the  mixing  matrix  are  positive 
attenuation  coefficients  of  the  energy  intensity. 

Given  the  previous  restriction,  any  fixed  choice  of  R  such  that  iq,  ro  are 
properly  contained  in  the  positive  quadrant  does  assure  that  there  is  a  lower 
bound  on  <  Vi,rj  >,  j  =  1,2  for  any  possible  Vi  in  the  positive  quadrant,  this 
in  turn  gives  an  upper  bound  on  the  possible  value  of  F(rj)  in  the  optimization 
problem  (a).  Note  that  the  choice  of  R  =  ,  that  is  the  simple  quotient 

would  reduce  the  resolution  of  any  source  s,  such  that  (a,,,  &,;)  ?s  (1,0)  or 
( a,i,bf )  (0,1)  since  in  the  first  case  s,;  would  have  marginal  contribution  in 

the  denominator,  and  in  the  second  case  the  signal  would  be  marginal  in  the 
numerator. 

Definition:  Given  a  data  set  F,  let  Fg  be  the  histogram  of  the  values  of  F 
with  bin  size  /?.  A  measure  of  the  roughness  of  Fg  is: 

f,  [±(Fg(nP)-Fl3((n-  l),d))]2 

^mnm  +  iFpdn-mi+j 

J  is  a  parameter  that  has  the  effect  of  reducing  the  contribution  of  values  of  Fg 
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that  are  of  the  order  of  J.  We  can  see  in  I(F,/3 )  a  discrete  modified  version  of 
the  roughness  penalty  integral  J  used  by  Good  and  Gaskins  in  [GG]. 

With  a  slight  abuse  of  notation  denote  by  3?(Q(r,£))  the  data  set  given  by 
non-zero  values  of  w)  with  (t,w)  in  the  given  time  frequency  do¬ 

main. 

We  are  ready  now  to  write  down  the  first  step  of  our  algorithm: 

(Al)  Slope  Detection:  Consider  an  exploratory  matrix  Rq  with  positive 
rows  bounded  away  from  the  vectors  (1,0),  (0,1).  Compute  Q(R0,e)  for  some 
e.  Build  a  best  estimation  of  the  value  distribution  of  3f(Q(i?0,e))  choosing  the 
width  /j  of  the  bins  of  the  histogram  of  the  values  of  5R(Q(r0>£))  so  that  the 
roughness  index  I($l(Q (r0 iC)) ,  0)  is  minimized  for  ft  =  f3.  Detect  the  position  of 
the  qR0(i)’ s  (see  next  remark)  and  compute  the  corresponding  slopes  ^ .  The 
number  E  of  slopes  detected  is  our  estimation  of  the  number  of  distinct  sources 
( in  practice  two  sources  withh  very  close  slope  may  not  be  detected  as  distinct, 
see  example  2  and  the  discussion  that  follows  it). 

Remark  3.2:  Because  of  the  presence  of  J,  ’’large”  peaks  of  ^(Q(R0,e))j3 
will  be  quite  smooth.  In  practice  we  see  that,  when  the  speech  time  series  are 
sufficiently  long  (order  of  few  seconds  with  sampling  rate  of  8192  Hz),  a  choice 
of  J  ps  100  is  often  sufficient  to  smoothen  the  major  peaks.  The  smoothness 
of  the  main  peaks  is  actually  so  high  when  enough  data  are  used,  that  we  can 
detect  the  position  of  the  qR0(i)’ s  simply  by  the  following  procedure  that  detect 
’’large”  local  maxima: 
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Let  F  =  3?(Q(i?0i£))  and  consider  the  discrete  function  D(Fj3)  =  |Fg(n/J)  — 
Fp((n  —  1)/?)|.  Let  L  =  max  D{F~3)  be  the  maximum  local  displacement.  We 
assume  that  a  value  x  corresponds  to  a  true  qR(i)  if  Fp(x)  is  a  local  maximum 
and  if  we  can  find  y\  and  y-2  such  that  y\  <  x  <  y-2  with  \Fg{x)  —  F^(yj)\  >  <j>L , 
i  =  1,2  and  F~3(y)  <  Fj~3(x )  for  y  G  [yi,y-2\,  V  ±  x,  <p  »  1. 

This  strategy  would  work  only  if  the  smoothness  of  the  histogram  is  relatively 
uniform  on  its  domain,  otherwise  a  very  sharp  main  feature  can  produce  a  value 
of  L  that  is  too  large  for  less  pronunced  peaks. 

There  is  an  element  of  indetermination  in  the  choice  of  <f>,  we  want  at  least 
<t>  >  1,  but  one  may  need  larger  values  of  <f>. 

In  any  case  we  left  aside  this  issue  in  the  description  of  step  A1  since  there 
are  several  ways  to  choose  the  main  features  of  an  histogram  and  such  choice  is 
part  of  a  more  general  problem  than  the  one  treated  in  this  paper. 

We  can  now  identify  the  regions  where  the  identified  sources  are  isolated. 

(A2)  Cluster  Construction:  For  j  =  1  ,...n,  compute  functions  Qj  such 
that  Qj(t,  w)  =  Q(Ro^(t,w)  if  (L"\>)  ~  Qr0U)\  <  P  and  Qj(t,w)  =  0 

otherwise.  Let  Gj  =  Supp(Qj)  be  the  support  of  the  Qf  s. 

The  Gj’s  are  our  estimates  of  the  regions  of  the  time-frequency  plane  where 
the  Sj’s  are  isolated. 

Suppose  now  that  a  total  of  n  observations  X{  were  available  and  let,  to 
make  notation  uniform,  x  =  Ms  where  x  =  ( x3 ,  ...,x „)*,  s  =  (si,  ...,.sn)*  and  M 
is  an  n  x  n  invertible  matrix.  Note  that  for  each  point  in  the  time-frequency 
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domain  M~1X(t,w)  =  S(t,w)  where  X(t,w )  =  (Xi(t,w), Xn(t,w)Y  and 
S(t,w)  =  (Si(t,iu),  ...,Sn(t, w)Y .  Assume  that  E  =  n,  that  is,  assume  that  we 
are  able  to  identify  isolated  regions  in  time-frequency  domain  for  all  signals. 

Then,  given  values  such  that  (U,Wi)  £  G,:,  i  =  1  the  following 

equalities  hold: 

(i)  M~1X(ti,  Wi)  =  (0,0,  ...,Si(ti,w, ,0)*, 

since  at  each  (G,w*)  only  5;  is  non-zero. 

Each  of  the  equalities  (i)  for  i  =  1,  ...,  n  gives  some  conditions  on  the  coeffi¬ 
cients  of  M  1 .  therefore  we  can  now  write  the  last  two  steps  of  our  reconstruction 
algorithm. 

(A3)  Constraints  on  Inverse  Mixing  Matrix:  Choose  points  £ 

G,;,  i  =  l,...,k2,  let  (pi,...,pn)  be  the  rows  of  M-1,  for  each  pk  k  =  1  ,...,n 
consider  the  systems: 

[Ek)  {<  Pk,X{tuWi)  >=  0,  i  Y  £}• 

Find  non  zero  solutions  pk  of  (Ek).  Build  the  matrix  M-1  =  [pi,  ...,pn]. 

(A4)  Reconstruction  of  the  Sources:  Apply  M-1  to  (x\ , ...,  xnY,  then 
(si snY  =  M~1(xi,  ...,x„Y  is  our  estimate  of  the  sources. 

Each  system  (Ek)  is  an  underdetermined  system  of  n  —  1  equations  in  n 
unknows  (the  coefficients  of  each  pk).  Therefore  each  specific  solution  pk  of 
(Ek)  is  a  multiple  of  some  row  of  M_1  and  M~x  is  a  rescaled  permutation  of 
the  rows  of  M_1. 
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Figure  3:  Value  distribution  of  3?(Q/?,e)  and  detail  of  the  distribution  for  e  = 
10-2  (left)  and  for  e  =  10-3  (right). 

In  other  words  M-1  =  A S(M^1)  where  A  is  a  non-singular  diagonal  matrix 
and  S(M^1)  is  a  permutation  of  the  rows  of  M-1. 


We  mentioned  several  times  up  to  now  that  in  general  signals  may  be  dom¬ 
inant  in  some  regions  of  the  transformed  domain,  but  not  fully  separated.  Ex¬ 
perimental  work  shows  that  for  real  speech  signals  we  cannot  achieve  separation 
of  two  sources  if  their  corresponding  slopes  are  very  close  unlike  the  case  when 
there  are  fully  isolated  regions,  as  we  can  see  in  the  following  example: 

Example  2:  let  us  work  out  example  1  again  without  setting  artificially  the 
signals  equal  to  zero  on  small  windows. 

Figure  3  shows  that  the  histogram  of  the  value  distribution  with  optimal  bin 
size  does  not  allow  to  distinguish  (//?(!)  and  Qr(2)  for  e  =  10-2,  but  it  achieves 
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Figure  4:  Value  distribution  of  3?(Q/f,e)  and  detail  of  the  distribution  for  e  = 
10-2  (left)  and  for  e  =  10-3  (right)  when  the  intensity  of  the  sources  is  changed. 


some  separation  for  e  =  10-3  (see  detail  at  the  bottom  light). 

On  the  other  hand  the  same  analysis  in  which  si  and  so  are  replaced  by 
s\  =  4si  and  s'2  =  fails  to  show  any  discrimination  between  the  two  sources 
as  shown  by  the  histograms  in  figure  4-  Note  that  the  directions  of  the  slopes  of 
the  sources  was  not  changed. 


Remark  3.3:  Example  2  suggests  that  closeness  of  the  slopes  and  the  degree 
of  dominance  of  signals  in  small  regions  are  related  in  achieving  separation,  and 
we  wonder  to  which  extent  the  imaginary  part  threshold  can  help  in  enhancing 
the  resolving  power  of  the  algorithm.  Clearly  the  specific  choice  of  the  threshold 
is  a  crucial  factor  in  achieving  optimal  resolution.  But  the  possibility  of  using 
very  small  thresholds  is  limited  in  practice  also  by  the  finite  sample  size.  We 
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Figure  5:  Observations. 

need  in  this  case  to  control  e  so  that  the  optimal  roughness  index  I(^t(Q(R>e),P) 
is  always  below  some  given  constant.  Further  work  is  in  progress  on  these  issues. 

It  turns  out  that  the  previous  simple  algorithm  performs  well  also  for  real 
speech  signals  when  the  slopes  of  the  sources  are  well  separated. 


Example  3:  Consider  the  case  in  which  the  four  speech  signals  s i  ,  so,  s 3  and 

0.2,0.6,0.18,0.3 
0.2,0.22,0.25,0.5 
0.65,0.2,0.4,0.6 
0.5,0.99,0.3,0.4 

on  a  time  interval  of  one  second.  Let  so(t )  =  Si(t  +  r)  with  r  =  0.073,  that 


sa  are  mixed  with  the  mixing  matrix  M  = 


is  two  of  the  sources  are  shifted  versions  of  each  other.  We  show  the  resulting 
mixtures  Xi,  i  =  1, ...,  4  in  figure  5. 

Assume  we  know  that  all  coefficients  of  M  are  positive,  then  we  can  use 


the  exploratory  matrix  R  = 


2,7 

1,1 


and  apply  directly  step  A1  to  the  first 
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Figure  6:  Value  distribution  of  5R(Qr,£) 

two  mixtures  with  a  choice  of  threshold  e  =  10-2.  The  value  distribution  with 
optimized  bin  size  fj  =  0.0054  is  shown  in  figure  6,  in  figure  7  we  show  the  graph 
of  the  index  computed  for  decreasing  values  of  /3  =  A,  n  =  10, 1000. 

The  estimates  ofqn(i)’s  corresponding  to  the  major  peaks  of  the  histograms 
are  in  increasing  order,  3.3475,  4-5020,  4-0025,  5.1275.  The  true  values  are: 
3.3415,  4-5000,  4-0070,  5.1250,  the  relative  error  is  less  than  2  *  10-3  for  all 
sources.  Step  A2  gives  us  the  clusters  whose  details  are  shown  in  figure  8 

Before  applying  step  A3  we  preprocessed  the  clusters  so  that  we  retain  only 
the  non  zero  values  of  the  G,  ’s  that  are  in  small  rectangular  regions  where  there 
is  very  high  density  of  non  zero  elements,  since  we  expect  regions  where  each 
signal  is  isolated  to  have  positive  area. 

This  heuristic  adjustment  reduces  in  practice  the  chance  of  selecting  in  step 
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Figure  7:  Computation  of  the  roughness  index  i)  for  n  —  10...  1000 
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Figure  8:  Details  of  clusters  where  sources  are  isolated 
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Figure  9:  Estimates  of,  clockwise  from  top  left,  S4,  S3,  so  and  si 


A3  points  that  do  not  belong  to  positive  area  regions  where  signals  are  isolated. 


Then  we  apply  step  A3  to  randomly  chosen  points  in  these  dense  regions  of 

-1.8097,-1.8046,-1.6412,-1.8021 
0.8726,0.8553,0.7969,0.8327 
-0.4809,  -0.4771,  -0.5094,  -0.4594 
1.0000,1.0000,1.0000,1.0000 


the  clusters  Gj  and  we  get  M  1  = 


where  all  last  components  of  each  row  were  chosen  equal  to  one.  Finally  step 
Af  gives  the  reconstructions  shown  in  figure  9,  rescaled  so  that  they  have  unit 
l1  norm  . 

Compare  these  reconstructions  to  the  rescaled  original  sources  shown  in  figure 
10.  All  reconstructions  have  a  signal  to  noise  ratio  between  16  and  HZ  decibel. 

We  would  like  to  stress  that,  while  on  one  hand  the  choice  of  mixing  matrix 
is  somehow  non  degenerate  since  the  slopes  of  the  sources  are  ”farv  from  each 
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Figure  10:  Original  Sources,  clockwise  from  top  left:  si,  so,  s 3,  S4 

other  and  the  norm  ofa.jSi,  bjSj,  is  of  the  same  order  of  magnitude  for  i  =  1, 
on  the  other  hand  the  length  of  the  time  interval  used  for  our  analysis  is  only 
one  second  as  opposed  to  several  hundred  seconds  in  traditional  ICA  algorithms. 
It  seems  likely  that  a  long  time  interval  will  benefit  the  accuracy  of  our  algorithm 
as  well ,  since  a  long  time  interval  increases  the  chance  of  having  areas  where 
signals  are  almost  isolated  as  in  example  1  (for  our  case  study  of  speech  signals, 
different  people,  hopefully,  start  to  speak  at  different  times...). 

We  believe  that  the  quotient  projections  algorithm  is  only  the  first  step  of  a 
class  of  non-linear  projection  algorithms  that  make  full  use  of  the  interplay  of 
real  and  imaginary  parts  of  homogeneous  quotients.  In  a  forthcoming  paper  we 
discuss  such  more  general  quotient  projections  algorithms  and  we  expand  the 
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theoretical  basis  of  the  method. 
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